Thanks go to Dr. Phil Bradley at the Fred Hutchinson Cancer Research Center for helping me come up with the idea for this project and helping me troubleshoot it.
While sick in bed, the American biochemist Linus Pauling started coming up with the structure of the alpha helix, one of the most common secondary protein structures. His structure turned out to be correct, and it is found in many proteins. This final project as about playing Linus Pauling in that the goal is to create an algorithm that correctly folds an amino acid sequence into an alpha-helix.
Proteins are critical for life as we know it. They are made up of individual amino acids that are joined together in peptide bonds. There are a total of 20 common amino acids, each of which have a backbone that is identical between amino acids and a unique sidechain that is unique to the amino acid (fig. 1). These peptides then fold into 3-D functional proteins.
The structure of a protein determines its function. The structure in turn is determined by the amino acid sequence of a protein. Protein structures are often hard to determine (though it can be done using methods like X-ray crystallography, nuclear magnetic resonance (NMR) or Cryo-EM). However, this is a time consuming process, and sometimes it is not possible to get a structure.
Figure 1. Schematic of a polypeptide chain. Each box depicts the basic chemical structure of an amino acid. Peptide bonds between amino acids are highlighted in blue. This basic structure is the same across all amino acids, the difference between individual amino acids is the chemical makeup of the side chain (also referred to as the R group), depicted in green.
It is often interesting to predict the effect of single amino acid changes on the structure of a protein, as this would give us clues about the functional effect of a mutation (e.g. in disease). However, it is probably needless to say, that there are more interesting experiments to conduct than resources to determine structures using real life experiments. Consequently, predicting the structure of a protein in-silico is an attractive alternative. However, protein folding is an NP-hard problem (Pierce, N. A., & Winfree, E., 2002), and 3D protein structures have yet to be predicted with the same accuracy as experimental methods.
Protein structure is divided into four levels: primary to quaternary structure. Primary structure refers to the linear sequence of amino acids that characterizes a peptide chain. Secondary structure refers to the local three dimensional structure of a segment of protein and is created by interactions of the amino acid backbones. The most common secondary structures are alpha-helices and beta-sheets. Secondary structures will also be the main subject of this simulation. Tertiary structures refer to the 3D folding of a protein that is created by the interaction of the individual amino acids sidechains. Finally, quaternary structure refers to the arrangement of protein subunits and their interactions.
Here I will start with a sequence that has been determined to fold correctly into an alpha-helix (by X-ray crystallography), perturb it, and then use an algorithm to make it fold back to its original confirmation. But first, some more background...
Molecules always tend toward their lowest energy state. This is also true for proteins, an unfolded protein in solution will wiggle around and take on different confirmations until it reaches its lowest energy confirmation (though it can get stuck in local minima). There are several forces at play here, I have described some of them below:
Other forces that will influence the stability of a protein are disulfide bonds, the solution a protein is in, salt concentration, pH and temperature. Protein folding simulations need to take these forces into account by approximating them as part of their objective function. Running a protein folding simulation can in itself be informative for our understanding of how proteins fold in nature. For example, if it is possible to get a protein to fold in silico using only van der Waals forces, we know that van der Waals forces are indeed an important component for proteins to be able to fold. If it is not possible to fold a protein in silico with only van der Waals forces being a part of the objective function, then we might conclude that there are more components to protein folding in nature. Put another way, simulations are an efficient way to test models and hypotheses we may have on how proteins fold.
Overall, the goal will be to minimize the energy as calculated by the objective function.
PyRosetta is a python based library that allows users to interact with Rosetta, a protein modelling suite that can be used for protein design and to calculate energy functions of protein conformations (Chaudhury, S. et al, 2010). This simulation will require PyRosetta to calculate the energy state of a protein, (this involves calculating several interaction forces between all of the atoms in a protein structure).
In addition to PyRosetta the following packages are required for this simulation to work, links are to websites that give instructions for their installation:
Additionally, for this to work in jupyter notebooks, widgetsnbextension, ipywidgets and ipython need to be installed. For installation tips/troubleshooting of ssbio take a look at the following website: https://github.com/SBRG/ssbio/wiki/Troubleshooting#nglviewer-fresh-install-tips
Note that PyRosetta (and maybe some of the other libraries) are notoriously hard to install on Windows machines, as a Linux system is required to get pyrosetta running. To install on windows it is necessary to install Ubuntu as a subsystem (described in the pyrosetta installation guide).
# Load and initialize all required packages
from pyrosetta import *
from pyrosetta.toolbox import *
init() # this is to initialize pyrosetta and to make sure it is working
import nglview as nv
import random
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
The alpha helix I will be working with is from the glucagon hormone (this can be accessed on PDB with the following accession: 1gcn, or here: https://www.rcsb.org/structure/1gcn). This structure was determined using X-ray crystallography. Before storing and loading the file I removed any atoms that were not part of the alpha-helix section using a text editor.
Additionally, the following structures are also in this folder and can be loaded by changing the accession code in the pose_from_pdb() function. I ran this simulation for all of the structures below with similar results (though the glucagon structure is the one I tested the most). All structures were chosen such that they have no or few prolines and glycines, as these typically tend to disrupt alpha helices.
The correctly folded glucagon alpha-helix is visualized below. I will start by scrambling up all amino acid backbone angles, then I will attempt to have the computer recreate the original fold. The coloured ribbon represents the protein's backbone, while the balls and sticks represent the atoms of the individual side chains.
Note that with nglview it is possible to rotate the structure and zoom in on certain parts. This is only possible if one runs the notebook, otherwise the output of a section of code that is meant to visualize the protein will only say NGLWidget(). Nonetheless, I have included static images of every important aspect of the simulation.
# Load the structure
helix = pose_from_pdb("1gcn.pdb")
view = nv.show_rosetta(helix)
view.add_ball_and_stick()
view

The amino acid sequence of the above helix is shown below in one letter abbreviations (e.g. L is a leucine, and A is an alanine). For a full key please consult the following website: https://molbiol-tools.ca/Amino_acid_abbreviations.htm
helix.sequence()
As mentioned, proteins will tend to favour lower energy conformations. For a simulation to find the lower energy conformations, the total energy needs to be estimated. There are several models that could be used. The most simple one is probably the Lennard Jones (LJ) potential model.
The LJ potential describes the potential energy interaction between two non-bonded atoms that are interacting with van der Waal forces. The total potential energy of the interaction depends on the distance between the two atoms. When two atoms are an infinite distance apart, they are neither attracting nor repelling each other. As they move closer, they will start attracting each other, until a point when they get too close to each other, at which point they will start repelling each other because they begin to invade each other's space.
Thus, mathematically, the LJ potential consists of two terms, an attractive term and a repelling term. This can be expressed as follows:
Where
Note that the values for $\epsilon$ and $\sigma$ change depending on the atoms in question and they have to be experimentally determined. If one is trying to calculate the LJ potential between two non-identical atoms things have to be averaged out.
Figure 2. Lennard Jones potential curve showing the potential energy on the y-axis an the distance between two atoms on the x-axis. (Image source: Chem Libre Texts).
PyRosetta allows us to set up a scoring function with just these two terms. This is done below, and the total potential energy for the native alpha helix is calculated. Note that fa_atr stands for full atom attraction, and fa_rep stands for full atom repulsion. Each component can also be given a weight. When the scores are printed out, the raw (unweighted score) and the weighted score are printed, as well as the total score of both components combined.
The following lists the score of the correctly folded conformation using the LJ potential function.
scorefxn = ScoreFunction()
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_atr, 1.0)
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_rep, 0.5)
print(scorefxn.show(helix))
Alpha helices are stabilized by hydrogen bonds between atoms of the backbones. Thus, for the purposes of this simulation I will give extra weight to these interactions in some instances.
Additionally, to reduce the complexity of this assignment I will only change the angles of the backbones. There are three different angles that I could manipulate. The two angles between backbone atoms (phi and psi angles, fig. 3), or the angles at which the side chains (R-groups) are attached (these are the ones I will not change). Besides reducing complexity, changing these angles should not be necessary to get an alpha-helix to fold, as this structure does not involve interactions between the sidechains (in theory). Since the sidechain angles are never changed, they will remain in the correct conformation from the moment the correct structure is read in.
Figure 3. Depiction of a protein backbone with the phi $\phi$ and $\psi$ angles labeled. Image source: Wikipedia.
Consequently, I will also remove any effect of the sidechains on the energy calculations for more complex energy calculations (this is done by removing the Dunbrack term).
As mentioned, only the phi and psi angles will be changed. At the start of each move, a random residue will be selected that will subsequently be changed. To ensure that the entire state-space can be explored, the algorithm selects from one of the following options: change a phi angle, change a psi angle, or change both. Each option will be chosen with an equal probability. The reason for all three options is that it is concievable that the simulation may get stuck in a local minimum and can only get out by changing both angles simultaneously. A new randomly chosen value (between 0 and 360) is then assigned to the angle(s) and the score is returned.
def changeConfirmation(struc):
# Select a random residue in the structure
res = random.randint(0, struc.total_residue() - 1)
# Randomly pick an angle to change it to
angle = random.randint(0, 360)
# this variable defines if the phi or the psi angle is being changed
phi_or_psi = random.randint(0, 2)
if phi_or_psi == 0:
struc.set_phi(res + 1, angle)
elif phi_or_psi == 1:
struc.set_phi(res + 1, angle)
else:
angle2 = random.randint(0, 360)
struc.set_phi(res + 1, angle)
struc.set_phi(res + 1, angle2)
score = scorefxn(helix)
return(score)
I will be evaluating my folding algorithm based on two metrics: the energy state of the folded peptide (and its comparison to the energy state of the native peptide), as well as the average distance between two atoms. The energy state of the folded peptide will be calculated using the energy function introduced above. The average distance between two atoms or root-mean-square deviation (RMSD) is calculated as follows.
Where $n$ is the number of atoms, and $v$ and $w$ are any two of them. $x$, $y$ and $z$ refer to the three spatial dimensions. Essentially, this gives us a normalized sum of the total pythagorean distance between all atoms.
First I will create a function that I can call every time to evaluate whether a change should be accepted or not. If the change results in a lower score, it is automatically accepted. If it is worse, it will only be accepted based on the Metropolis criterion (if the change in energy is favourable, the change is always accepted, otherwise it is only accepted with a certain probability. If the change in energy is drastic, it is less likely to be accepted. Additionally, worse changes are less likely to be accepted over time).
def foldIt(struc, temp):
# Get the current score
current_score = scorefxn(struc)
# Get the current rmsd value
current_rmsd = pyrosetta.rosetta.core.scoring.CA_rmsd(native, helix)
# append both to the respective dataframes
energy_scores.append(current_score)
rmsd_scores.append(current_rmsd)
# Create a deepcopy permutation of the structure that I can then manipulate
temp_helix = Pose()
temp_helix.assign(struc)
# Make a change to the temporary confirmation
changeConfirmation(temp_helix)
# Calculate the score of the new permutation
new_score = scorefxn(temp_helix)
# If the score is better accept the change
if new_score < current_score:
struc = temp_helix
# If the score is worse only accept the change with a certain probability that is dependent on the magnitude of the
# change. I.e. a bigger change in energy makes it less likely to be accepted
else:
delta = abs(new_score - current_score)
prob = math.exp(-delta/temperature)
test_prob = random.uniform(0, 1)
if(test_prob < prob):
struc = temp_helix
return(struc)
I will begin this simulation with the simplest scoring model: the Lennard Jones potential function and evaluate it. As described above, the energy of the native, correctly folded confirmation is ~ -26. At the start of every simulation I will change all backbone angles to 180 degrees such that 1) the structure is not an alpha-helix anymore, and 2) I always start out with the same unfolded structure (as opposed to randomly changing all angles). This is what the unfolded confirmation will look like:
# Change the folded helix structure such that all angles are 180 degrees
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
view = nv.show_rosetta(helix)
view

# Choose a temperature and cooling rate
temperature = 1000
coolingRate = 0.05
# Save the native conformation so that the RMSD values can be calculated
native = pose_from_pdb("1gcn.pdb")
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
# And the corresponding data frames to save energy and RMSD values throughout the simulation
energy = pd.DataFrame()
rmsd = pd.DataFrame()
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
view = nv.show_rosetta(helix)
view

Next, let us explore the results from this simulation. The final structure is clearly far from the original structure, and the energy of it is also much higher. This is also reflected in the RMSD values, as an RMSD of ~7 is too far from 0 to still resemble an alpha helix (a value of <2 would likely still reflect some form of alpha helix, see the end of this notebook for further analysis on this aspect).
Based on the graphs below we can also see that the simulation stops changing energy and RMSD values after a certain number of iterations, indicating that the simulation hit an equilibrium.
energy.plot(kind='line', figsize=(20,3))
plt.title('Leonard-Jones Potential Energy', fontsize = 20)
plt.ylabel('LJ-Potential', fontsize = 15)
plt.xlabel('Iterations', fontsize = 15)
rmsd.plot(kind='line', figsize=(20,3))
plt.title('RMSD Values', fontsize = 20)
plt.ylabel('RMSD Values', fontsize = 15)
plt.xlabel('Iterations', fontsize = 15)
We can also look at a scatterplot showing the correlation between RMSD and LJ-potential values. If everything worked, we should see a correlation, the higher the energy, the higher the RMSD value should be.
plt.scatter(rmsd, energy, alpha=0.5)
The peptide could not have folded correctly because the hyper parameters are not tuned right, thus I will next run a brute force search for better hyper parameters. The following arrays will contain all the temperatures and cooling rates that I will test.
# Set up hyperparameters to be searched
temperature_array = [100, 1000, 10000]
coolingRate_array = [0.01, 0.001, 0.0001]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
expName = str(temperature) + "_" + str(coolingRate)
names.append(expName)
# Initialize the helix to have angles of 180 degrees again
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
The following are plots showing the energy of the peptide as the simulation progresses. This simulation was still done with the LJ-potential scoring function. The temperature and cooling rate for each simulation are indicated at the top of the sub-figure.
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(energy)), energy['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('LJ-potential', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('LJ-potential', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('LJ-potential', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('LJ-potential', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('LJ-potential', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('LJ-potential', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('LJ-potential', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('LJ-potential', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('LJ-potential', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The final energies for each simulation are listed below:
print("Temperature: 100; cooling rate: 0.01-- ", energy['100_0.01'][energy['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", energy['100_0.001'][energy['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", energy['100_0.0001'][energy['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", energy['1000_0.01'][energy['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", energy['1000_0.001'][energy['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", energy['1000_0.0001'][energy['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", energy['10000_0.01'][energy['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", energy['10000_0.001'][energy['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", energy['10000_0.0001'][energy['10000_0.0001'].last_valid_index()])
What follows are the RMSD plots:
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(rmsd)), rmsd['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The final RMSD values are listed below:
print("Temperature: 100; cooling rate: 0.01-- ", rmsd['100_0.01'][rmsd['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", rmsd['100_0.001'][rmsd['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", rmsd['100_0.0001'][rmsd['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", rmsd['1000_0.01'][rmsd['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", rmsd['1000_0.001'][rmsd['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", rmsd['1000_0.0001'][rmsd['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", rmsd['10000_0.01'][rmsd['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", rmsd['10000_0.001'][rmsd['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", rmsd['10000_0.0001'][rmsd['10000_0.0001'].last_valid_index()])
And finally, the following are the scatterplots showing the correlation between the energy and RMSD values for each hyperparameter pair.
Rows 1 - 3 correspond to temperatures 100, 1,000 and 10,000, while columns 1 - 3 correspond to cooling rates of 0.01, 0.001 and 0.0001. The first plot in row one and column one thus shows the simulation with the following hyperparameters: temperature 100 and cooling rate 0.01.
f = plt.figure()
f, axes = plt.subplots(nrows = 3, ncols = 3, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['100_0.01'], energy['100_0.01'], alpha = 0.3)
axes[0][1].scatter(rmsd['100_0.001'], energy['100_0.001'], alpha = 0.3)
axes[0][2].scatter(rmsd['100_0.0001'], energy['100_0.0001'], alpha = 0.3)
axes[1][0].scatter(rmsd['1000_0.01'], energy['1000_0.01'], alpha = 0.3)
axes[1][1].scatter(rmsd['1000_0.001'], energy['1000_0.001'], alpha = 0.3)
axes[1][2].scatter(rmsd['1000_0.0001'], energy['1000_0.0001'], alpha = 0.3)
axes[2][0].scatter(rmsd['10000_0.01'], energy['10000_0.01'], alpha = 0.3)
axes[2][1].scatter(rmsd['10000_0.001'], energy['10000_0.001'], alpha = 0.3)
axes[2][2].scatter(rmsd['10000_0.0001'], energy['10000_0.0001'], alpha = 0.3)
plt.show()
Clearly all of these simulations end up with final structures that not the original alpha-helix. Maybe a different scoring function will do better, this time I will include the backbone hydrogen bonds, as they are largely the ones involved in stabilizing alpha helices. I will also set the weight of this parameter fairly high (20) to ensure it has a big effect.
scorefxn = ScoreFunction()
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_atr, 1.0)
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_rep, 0.5)
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.hbond_sr_bb, 20)
# Check the score of the folded protein with this new function
helix = pose_from_pdb("1gcn.pdb")
print(scorefxn.show(helix))
Next I will try the exact same simulation as above including the brute force hyper-parameter search using the new scoring function that contains the backbone hydrogen component
# Set up hyperparameters to be searched
temperature_array = [100, 1000, 10000]
coolingRate_array = [0.01, 0.001, 0.0001]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
expName = str(temperature) + "_" + str(coolingRate)
names.append(expName)
# Initialize the helix to have angles of 180 degrees again
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
The following are the energy values for the simulation above.
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(energy)), energy['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('LJ-potential', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('LJ-potential', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('LJ-potential', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('LJ-potential', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('LJ-potential', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('LJ-potential', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('LJ-potential', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('LJ-potential', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('LJ-potential', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The following are the final energy values for each simulation
print("Temperature: 100; cooling rate: 0.01-- ", energy['100_0.01'][energy['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", energy['100_0.001'][energy['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", energy['100_0.0001'][energy['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", energy['1000_0.01'][energy['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", energy['1000_0.001'][energy['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", energy['1000_0.0001'][energy['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", energy['10000_0.01'][energy['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", energy['10000_0.001'][energy['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", energy['10000_0.0001'][energy['10000_0.0001'].last_valid_index()])
The following are the RMSD values for the same simulation:
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(rmsd)), rmsd['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The following are the final RMSD values for each simulation
print("Temperature: 100; cooling rate: 0.01-- ", rmsd['100_0.01'][rmsd['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", rmsd['100_0.001'][rmsd['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", rmsd['100_0.0001'][rmsd['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", rmsd['1000_0.01'][rmsd['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", rmsd['1000_0.001'][rmsd['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", rmsd['1000_0.0001'][rmsd['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", rmsd['10000_0.01'][rmsd['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", rmsd['10000_0.001'][rmsd['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", rmsd['10000_0.0001'][rmsd['10000_0.0001'].last_valid_index()])
And finally, the scatterplots. Each row and column still correspond to the same energy and cooling rate values.
f = plt.figure()
f, axes = plt.subplots(nrows = 3, ncols = 3, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['100_0.01'], energy['100_0.01'], alpha = 0.3)
axes[0][1].scatter(rmsd['100_0.001'], energy['100_0.001'], alpha = 0.3)
axes[0][2].scatter(rmsd['100_0.0001'], energy['100_0.0001'], alpha = 0.3)
axes[1][0].scatter(rmsd['1000_0.01'], energy['1000_0.01'], alpha = 0.3)
axes[1][1].scatter(rmsd['1000_0.001'], energy['1000_0.001'], alpha = 0.3)
axes[1][2].scatter(rmsd['1000_0.0001'], energy['1000_0.0001'], alpha = 0.3)
axes[2][0].scatter(rmsd['10000_0.01'], energy['10000_0.01'], alpha = 0.3)
axes[2][1].scatter(rmsd['10000_0.001'], energy['10000_0.001'], alpha = 0.3)
axes[2][2].scatter(rmsd['10000_0.0001'], energy['10000_0.0001'], alpha = 0.3)
plt.show()
Clearly adding the backbone hydrogen bond term to the energy function did not significantly improve the simulation results. Thus I will now use all of the parameters available through PyRosetta (with the exception of the dunbrack term, as this is for scoring the sidechaing which are in the original confirmation.
# set up the scoring function
scorefxn = get_fa_scorefxn()
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_dun, 0) # set the dunbrack term to zero
Find the total energy of the folded protein using this scoring function.
helix = pose_from_pdb("1gcn.pdb")
print(scorefxn.show(helix))
Run the same simulation again with the same brute force hyper-parameter search.
# Set up hyperparameters to be searched
temperature_array = [100, 1000, 10000]
coolingRate_array = [0.01, 0.001, 0.0001]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
expName = str(temperature) + "_" + str(coolingRate)
names.append(expName)
# Initialize the helix to have angles of 180 degrees again
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
The following are the energies for the above simulation.
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(energy)), energy['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('LJ-potential', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('LJ-potential', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('LJ-potential', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('LJ-potential', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('LJ-potential', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('LJ-potential', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('LJ-potential', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('LJ-potential', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('LJ-potential', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The following are the final values for the energy scores of the above simulation
print("Temperature: 100; cooling rate: 0.01-- ", energy['100_0.01'][energy['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", energy['100_0.001'][energy['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", energy['100_0.0001'][energy['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", energy['1000_0.01'][energy['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", energy['1000_0.001'][energy['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", energy['1000_0.0001'][energy['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", energy['10000_0.01'][energy['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", energy['10000_0.001'][energy['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", energy['10000_0.0001'][energy['10000_0.0001'].last_valid_index()])
The following are the RMSD values corresponding to the above simulation.
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(rmsd)), rmsd['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The following are the final RMSD scores for the above simulation
print("Temperature: 100; cooling rate: 0.01-- ", rmsd['100_0.01'][rmsd['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", rmsd['100_0.001'][rmsd['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", rmsd['100_0.0001'][rmsd['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", rmsd['1000_0.01'][rmsd['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", rmsd['1000_0.001'][rmsd['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", rmsd['1000_0.0001'][rmsd['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", rmsd['10000_0.01'][rmsd['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", rmsd['10000_0.001'][rmsd['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", rmsd['10000_0.0001'][rmsd['10000_0.0001'].last_valid_index()])
The RMSD-energy scatterplots for the simulation above is shown below.
f = plt.figure()
f, axes = plt.subplots(nrows = 3, ncols = 3, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['100_0.01'], energy['100_0.01'], alpha = 0.3)
axes[0][1].scatter(rmsd['100_0.001'], energy['100_0.001'], alpha = 0.3)
axes[0][2].scatter(rmsd['100_0.0001'], energy['100_0.0001'], alpha = 0.3)
axes[1][0].scatter(rmsd['1000_0.01'], energy['1000_0.01'], alpha = 0.3)
axes[1][1].scatter(rmsd['1000_0.001'], energy['1000_0.001'], alpha = 0.3)
axes[1][2].scatter(rmsd['1000_0.0001'], energy['1000_0.0001'], alpha = 0.3)
axes[2][0].scatter(rmsd['10000_0.01'], energy['10000_0.01'], alpha = 0.3)
axes[2][1].scatter(rmsd['10000_0.001'], energy['10000_0.001'], alpha = 0.3)
axes[2][2].scatter(rmsd['10000_0.0001'], energy['10000_0.0001'], alpha = 0.3)
plt.show()
Clearly both the energy values and the RMSD values are still far from the native structure. In the scoring function above I used the default weight for hydrogen bonds between atoms of the backbones. I will now try to use the exact same function but increase the weight of the backbone hydrogen bonds to 20. The dunbrack term will remain zero.
# Set up scoring function
scorefxn = get_fa_scorefxn()
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_dun, 0) # Set dunbrack term to zero
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.hbond_sr_bb, 20) # Increase importance of the backbone h-bonds
# Find value of the native structure associated with this scoring function
helix = pose_from_pdb("1gcn.pdb")
print(scorefxn.show(helix))
Run the same simulation as above with the same brute force hyper-parameter search.
# Set up hyperparameters to be searched
temperature_array = [100, 1000, 10000]
coolingRate_array = [0.01, 0.001, 0.0001]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
expName = str(temperature) + "_" + str(coolingRate)
names.append(expName)
# Initialize the helix to have angles of 180 degrees again
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
The energy plots for the above simulations:
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(energy)), energy['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('LJ-potential', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('LJ-potential', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('LJ-potential', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('LJ-potential', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('LJ-potential', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('LJ-potential', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('LJ-potential', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('LJ-potential', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('LJ-potential', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The final energy scores for each simulation are shown below:
print("Temperature: 100; cooling rate: 0.01-- ", energy['100_0.01'][energy['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", energy['100_0.001'][energy['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", energy['100_0.0001'][energy['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", energy['1000_0.01'][energy['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", energy['1000_0.001'][energy['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", energy['1000_0.0001'][energy['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", energy['10000_0.01'][energy['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", energy['10000_0.001'][energy['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", energy['10000_0.0001'][energy['10000_0.0001'].last_valid_index()])
The final RMSD scores for each simulation are shown below:
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9) = plt.subplots(9, 1, figsize=(20,60))
ax1.plot(range(len(rmsd)), rmsd['100_0.01'])
ax1.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15)
ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['100_0.001'])
ax2.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15)
ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['100_0.0001'])
ax3.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15)
ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['1000_0.01'])
ax4.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15)
ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1000_0.001'])
ax5.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15)
ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1000_0.0001'])
ax6.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15)
ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['10000_0.01'])
ax7.set_title('Temperature: 10000; cooling rate: 0.01', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15)
ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['10000_0.001'])
ax8.set_title('Temperature: 10000; cooling rate: 0.001', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15)
ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10000_0.0001'])
ax9.set_title('Temperature: 10000; cooling rate: 0.0001', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15)
ax9.set_xlabel('Iterations', fontsize = 15)
The final RMSD values for each simulation are listed below:
print("Temperature: 100; cooling rate: 0.01-- ", rmsd['100_0.01'][rmsd['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001-- ", rmsd['100_0.001'][rmsd['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001-- ", rmsd['100_0.0001'][rmsd['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01-- ", rmsd['1000_0.01'][rmsd['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001-- ", rmsd['1000_0.001'][rmsd['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001-- ", rmsd['1000_0.0001'][rmsd['1000_0.0001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.01-- ", rmsd['10000_0.01'][rmsd['10000_0.01'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.001-- ", rmsd['10000_0.001'][rmsd['10000_0.001'].last_valid_index()])
print("Temperature: 10000; cooling rate: 0.0001-- ", rmsd['10000_0.0001'][rmsd['10000_0.0001'].last_valid_index()])
RMSD-energy scatterplots for the above simulation
f = plt.figure()
f, axes = plt.subplots(nrows = 3, ncols = 3, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['100_0.01'], energy['100_0.01'], alpha = 0.3)
axes[0][1].scatter(rmsd['100_0.001'], energy['100_0.001'], alpha = 0.3)
axes[0][2].scatter(rmsd['100_0.0001'], energy['100_0.0001'], alpha = 0.3)
axes[1][0].scatter(rmsd['1000_0.01'], energy['1000_0.01'], alpha = 0.3)
axes[1][1].scatter(rmsd['1000_0.001'], energy['1000_0.001'], alpha = 0.3)
axes[1][2].scatter(rmsd['1000_0.0001'], energy['1000_0.0001'], alpha = 0.3)
axes[2][0].scatter(rmsd['10000_0.01'], energy['10000_0.01'], alpha = 0.3)
axes[2][1].scatter(rmsd['10000_0.001'], energy['10000_0.001'], alpha = 0.3)
axes[2][2].scatter(rmsd['10000_0.0001'], energy['10000_0.0001'], alpha = 0.3)
plt.show()
So far, none of the methods have correctly folded the peptide. This could be because the folded conformation is in a narrow valley/hole in the multidimensional landscape we are dealing with. To get a better sense for this I will start at the correctly folded conformation, run the algorithm with different temperatures and cooling rates and visualize how hard it is to return to the original conformation.
In theory, this should make a few changes that increase the energy score but then reverse them and return to a conformation that is similar to the original one.
# Load the original structure again
helix = pose_from_pdb("1gcn.pdb")
view = nv.show_rosetta(helix)
view

For this part I will use the same scoring function as above. (All PyRosetta parameters, no dunbrack term, and a weight of 20 for the backbone hydrogen bonds).
# Define the scoring function again
scorefxn = get_fa_scorefxn()
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.fa_dun, 0)
scorefxn.set_weight(pyrosetta.rosetta.core.scoring.ScoreType.hbond_sr_bb, 20)
print(scorefxn.show(helix))
# Set up hyperparameters to be searched
temperature_array = [0.1, 1, 10, 100, 1000]
coolingRate_array = [0.1, 0.01, 0.001, 0.0001]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
expName = str(temperature) + "_" + str(coolingRate)
names.append(expName)
# Return to native structure
helix = pose_from_pdb("1gcn.pdb")
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt(helix, temperature)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16, ax17, ax18, ax19, ax20) = plt.subplots(20, 1, figsize=(20,130))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(energy)), energy['0.1_0.1'])
ax1.set_title('Temperature: 0.1; cooling rate: 0.1', fontsize = 20)
ax1.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['0.1_0.01'])
ax2.set_title('Temperature: 0.1; cooling rate: 0.01', fontsize = 20)
ax2.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['0.1_0.001'])
ax3.set_title('Temperature: 0.1; cooling rate: 0.001', fontsize = 20)
ax3.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['0.1_0.0001'])
ax4.set_title('Temperature: 0.1; cooling rate: 0.0001', fontsize = 20)
ax4.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1_0.1'])
ax5.set_title('Temperature: 1; cooling rate: 0.1', fontsize = 20)
ax5.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1_0.01'])
ax6.set_title('Temperature: 1; cooling rate: 0.01', fontsize = 20)
ax6.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['1_0.001'])
ax7.set_title('Temperature: 1; cooling rate: 0.001', fontsize = 20)
ax7.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['1_0.0001'])
ax8.set_title('Temperature: 1; cooling rate: 0.0001', fontsize = 20)
ax8.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10_0.1'])
ax9.set_title('Temperature: 10; cooling rate: 0.1', fontsize = 20)
ax9.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(energy)), energy['10_0.01'])
ax10.set_title('Temperature: 10; cooling rate: 0.01', fontsize = 20)
ax10.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(energy)), energy['10_0.001'])
ax11.set_title('Temperature: 10; cooling rate: 0.001', fontsize = 20)
ax11.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(energy)), energy['10_0.0001'])
ax12.set_title('Temperature: 10; cooling rate: 0.0001', fontsize = 20)
ax12.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(energy)), energy['100_0.1'])
ax13.set_title('Temperature: 100; cooling rate: 0.1', fontsize = 20)
ax13.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(energy)), energy['100_0.01'])
ax14.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax14.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(energy)), energy['100_0.001'])
ax15.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax15.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(energy)), energy['100_0.0001'])
ax16.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax16.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax17.plot(range(len(energy)), energy['1000_0.1'])
ax17.set_title('Temperature: 1000; cooling rate: 0.1', fontsize = 20)
ax17.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax18.plot(range(len(energy)), energy['1000_0.01'])
ax18.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax18.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax19.plot(range(len(energy)), energy['1000_0.001'])
ax19.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax19.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax20.plot(range(len(energy)), energy['1000_0.0001'])
ax20.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax20.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
The final energies of each simulation are as follows:
# Final values for each simulation
print("Temperature: 0.1; cooling rate: 0.1", energy['0.1_0.1'][energy['0.1_0.1'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.01", energy['0.1_0.01'][energy['0.1_0.01'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.001", energy['0.1_0.001'][energy['0.1_0.001'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.0001", energy['0.1_0.0001'][energy['0.1_0.0001'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.1", energy['1_0.1'][energy['1_0.1'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01", energy['1_0.01'][energy['1_0.01'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001", energy['1_0.001'][energy['1_0.001'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.0001", energy['1_0.0001'][energy['1_0.0001'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.1", energy['10_0.1'][energy['10_0.1'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01", energy['10_0.01'][energy['10_0.01'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001", energy['10_0.001'][energy['10_0.001'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.0001", energy['10_0.0001'][energy['10_0.0001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.1", energy['100_0.1'][energy['100_0.1'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01", energy['100_0.01'][energy['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001", energy['100_0.001'][energy['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001", energy['100_0.0001'][energy['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.1", energy['1000_0.1'][energy['1000_0.1'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01", energy['1000_0.01'][energy['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001", energy['1000_0.001'][energy['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001", energy['1000_0.0001'][energy['1000_0.0001'].last_valid_index()])
The following are the simulations RMSD values
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16, ax17, ax18, ax19, ax20) = plt.subplots(20, 1, figsize=(20,120))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(rmsd)), rmsd['0.1_0.1'])
ax1.set_title('Temperature: 0.1; cooling rate: 0.1', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['0.1_0.01'])
ax2.set_title('Temperature: 0.1; cooling rate: 0.01', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15); ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['0.1_0.001'])
ax3.set_title('Temperature: 0.1; cooling rate: 0.001', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15); ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['0.1_0.0001'])
ax4.set_title('Temperature: 0.1; cooling rate: 0.0001', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15); ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1_0.1'])
ax5.set_title('Temperature: 1; cooling rate: 0.1', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15); ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1_0.01'])
ax6.set_title('Temperature: 1; cooling rate: 0.01', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15); ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['1_0.001'])
ax7.set_title('Temperature: 1; cooling rate: 0.001', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15); ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['1_0.0001'])
ax8.set_title('Temperature: 1; cooling rate: 0.0001', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15); ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10_0.1'])
ax9.set_title('Temperature: 10; cooling rate: 0.1', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15); ax9.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(rmsd)), rmsd['10_0.01'])
ax10.set_title('Temperature: 10; cooling rate: 0.01', fontsize = 20)
ax10.set_ylabel('RMSD', fontsize = 15); ax10.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(rmsd)), rmsd['10_0.001'])
ax11.set_title('Temperature: 10; cooling rate: 0.001', fontsize = 20)
ax11.set_ylabel('RMSD', fontsize = 15); ax11.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(rmsd)), rmsd['10_0.0001'])
ax12.set_title('Temperature: 10; cooling rate: 0.0001', fontsize = 20)
ax12.set_ylabel('Energy', fontsize = 15); ax12.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(rmsd)), rmsd['100_0.1'])
ax13.set_title('Temperature: 100; cooling rate: 0.1', fontsize = 20)
ax13.set_ylabel('RMSD', fontsize = 15); ax13.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(rmsd)), rmsd['100_0.01'])
ax14.set_title('Temperature: 100; cooling rate: 0.01', fontsize = 20)
ax14.set_ylabel('RMSD', fontsize = 15); ax14.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(rmsd)), rmsd['100_0.001'])
ax15.set_title('Temperature: 100; cooling rate: 0.001', fontsize = 20)
ax15.set_ylabel('RMSD', fontsize = 15); ax15.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(rmsd)), rmsd['100_0.0001'])
ax16.set_title('Temperature: 100; cooling rate: 0.0001', fontsize = 20)
ax16.set_ylabel('RMSD', fontsize = 15); ax16.set_xlabel('Iterations', fontsize = 15)
ax17.plot(range(len(rmsd)), rmsd['1000_0.1'])
ax17.set_title('Temperature: 1000; cooling rate: 0.1', fontsize = 20)
ax17.set_ylabel('Energy', fontsize = 15); ax17.set_xlabel('Iterations', fontsize = 15)
ax18.plot(range(len(rmsd)), rmsd['1000_0.01'])
ax18.set_title('Temperature: 1000; cooling rate: 0.01', fontsize = 20)
ax18.set_ylabel('RMSD', fontsize = 15); ax18.set_xlabel('Iterations', fontsize = 15)
ax19.plot(range(len(rmsd)), rmsd['1000_0.001'])
ax19.set_title('Temperature: 1000; cooling rate: 0.001', fontsize = 20)
ax19.set_ylabel('RMSD', fontsize = 15); ax19.set_xlabel('Iterations', fontsize = 15)
ax20.plot(range(len(rmsd)), rmsd['1000_0.0001'])
ax20.set_title('Temperature: 1000; cooling rate: 0.0001', fontsize = 20)
ax20.set_ylabel('RMSD', fontsize = 15); ax20.set_xlabel('Iterations', fontsize = 15)
The final RMSD values are as follows
# Final values for each simulation
print("Temperature: 0.1; cooling rate: 0.1--", rmsd['0.1_0.1'][rmsd['0.1_0.1'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.01--", rmsd['0.1_0.01'][rmsd['0.1_0.01'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.00--", rmsd['0.1_0.001'][rmsd['0.1_0.001'].last_valid_index()])
print("Temperature: 0.1; cooling rate: 0.0001--", rmsd['0.1_0.0001'][rmsd['0.1_0.0001'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.1--", rmsd['1_0.1'][rmsd['1_0.1'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01--", rmsd['1_0.01'][rmsd['1_0.01'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001--", rmsd['1_0.001'][rmsd['1_0.001'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.0001--", rmsd['1_0.0001'][rmsd['1_0.0001'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.1--", rmsd['10_0.1'][rmsd['10_0.1'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01--", rmsd['10_0.01'][rmsd['10_0.01'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001--", rmsd['10_0.001'][rmsd['10_0.001'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.0001--", rmsd['10_0.0001'][rmsd['10_0.0001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.1--", rmsd['100_0.1'][rmsd['100_0.1'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01--", rmsd['100_0.01'][rmsd['100_0.01'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001--", rmsd['100_0.001'][rmsd['100_0.001'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.0001--", rmsd['100_0.0001'][rmsd['100_0.0001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.1--", rmsd['1000_0.1'][rmsd['1000_0.1'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01--", rmsd['1000_0.01'][rmsd['1000_0.01'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001--", rmsd['1000_0.001'][rmsd['1000_0.001'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.0001--", rmsd['1000_0.0001'][rmsd['1000_0.0001'].last_valid_index()])
Scatterplots for RMSD and energy values. Columns and rows still represent different cooling rates and temperatures.
f = plt.figure()
f, axes = plt.subplots(nrows = 5, ncols = 4, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['0.1_0.1'], energy['0.1_0.1'], alpha = 0.3)
axes[0][1].scatter(rmsd['0.1_0.01'], energy['0.1_0.01'], alpha = 0.3)
axes[0][2].scatter(rmsd['0.1_0.001'], energy['0.1_0.001'], alpha = 0.3)
axes[0][3].scatter(rmsd['0.1_0.0001'], energy['0.1_0.0001'], alpha = 0.3)
axes[1][0].scatter(rmsd['1_0.1'], energy['1_0.1'], alpha = 0.3)
axes[1][1].scatter(rmsd['1_0.01'], energy['1_0.01'], alpha = 0.3)
axes[1][2].scatter(rmsd['1_0.001'], energy['1_0.001'], alpha = 0.3)
axes[1][3].scatter(rmsd['1_0.0001'], energy['1_0.0001'], alpha = 0.3)
axes[2][0].scatter(rmsd['10_0.1'], energy['10_0.1'], alpha = 0.3)
axes[2][1].scatter(rmsd['10_0.01'], energy['10_0.01'], alpha = 0.3)
axes[2][2].scatter(rmsd['10_0.001'], energy['10_0.001'], alpha = 0.3)
axes[2][3].scatter(rmsd['10_0.0001'], energy['10_0.0001'], alpha = 0.3)
axes[3][0].scatter(rmsd['100_0.1'], energy['100_0.1'], alpha = 0.3)
axes[3][1].scatter(rmsd['100_0.01'], energy['100_0.01'], alpha = 0.3)
axes[3][2].scatter(rmsd['100_0.001'], energy['100_0.001'], alpha = 0.3)
axes[3][3].scatter(rmsd['100_0.0001'], energy['100_0.0001'], alpha = 0.3)
axes[4][0].scatter(rmsd['1000_0.1'], energy['1000_0.1'], alpha = 0.3)
axes[4][1].scatter(rmsd['1000_0.01'], energy['1000_0.01'], alpha = 0.3)
axes[4][2].scatter(rmsd['1000_0.001'], energy['1000_0.001'], alpha = 0.3)
axes[4][3].scatter(rmsd['1000_0.0001'], energy['1000_0.0001'], alpha = 0.3)
plt.show()
Some of these simulations have RMSD values that are close to the native structure. Notably the simulation with a temperature of 10 and a cooling rate of 0.1 (RMSD 0.2 and an energy of -25), both of which are close to the native. However, some of these simulations with low energies and high cooling rates are to be interpreted cautiously, as these simulations will go through few steps and thus naturally stay close to the native conformation.
We know that the fully changed conformation (all angles at 180 degrees) has an RMSD value of around 10. However, what is missing thus far is a sense for how much different an RMSD of 3 is from the native. To get a sense of this, what follows are several structures from simulations that started at the native. Each of these has a different temperature and cooling rate associated with it.
With a temperature of 0.1, a cooling rate of 0.0001 we get an RMSD of 2.3:
Temperature of 10, cooling rate of 0.0001 we get an RMSD of 3.1 (the first replicate):
The second replicate with the same parameters get us an RMSD of 3.3:
Temperature of 100, cooling rate of 0.001 gets us an RMSD of 3.9 (the highest so far).
Increasing the temperature to 1000 results in an RMSD of 3.7:
Increasing the temperature even further to 10,000 (without changing the cooling rate does not seem to have an effect on the RMSD/final structure. The following has an RMSD of 3.1):

For the sake of comparison, starting the simulation at a conformation where all angles are set to 180 degrees returns the following result (RMSD of ~9):

This part of the analysis shows that a conformation that is close to the original alpha helix can be reached with this scoring function if one starts from the folded conformation. The big question is why this conformation is not reached when one starts in the unfolded conformation. This is the question the next section of the analysis will tackle.
One potential reason for the peptide not folding correctly could be that the moves the algorithm is making are too drastic (i.e. large deviations in agles change the conformation too much). Thus, this next section will repeat the monte carlo simulations above twice: once from the disordered state, and once from the native state.
To accomplish this I will find values for the new angles based on a normal distribution centered around the old agle. To prevent angles being set to values bigger than 360 degrees I will take the remainder of the new value after it is divided by 360. The following functions are written in such a way that the standard deviation associated with the normal distribution can be varied from simulation to simulation.
def changeConfirmation2(struc, sd):
# Select a random residue in the structure
res = random.randint(1, struc.total_residue() - 1)
# this variable defines if the phi or the psi angle is being changed
phi_or_psi = random.randint(0, 2)
if phi_or_psi == 0:
oldPhi = helix.phi(res)
newPhi = np.random.normal(oldPhi, sd) % 360
struc.set_phi(res + 1, newPhi)
elif phi_or_psi == 1:
oldPsi = helix.psi(res)
newPsi = np.random.normal(oldPsi, sd) % 360
struc.set_phi(res + 1, newPsi)
else:
oldPhi = helix.phi(res)
newPhi = np.random.normal(oldPhi, sd) % 360
oldPsi = helix.psi(res)
newPsi = np.random.normal(oldPsi, sd) % 360
struc.set_phi(res + 1, newPhi)
struc.set_phi(res + 1, newPsi)
score = scorefxn(helix)
return(score)
def foldIt2(struc, temp, sd):
# Get the current score
current_score = scorefxn(struc)
# Get the current rmsd value
current_rmsd = pyrosetta.rosetta.core.scoring.CA_rmsd(native, helix)
# append both to the respective dataframes
energy_scores.append(current_score)
rmsd_scores.append(current_rmsd)
# Create a deepcopy permutation of the structure that I can then manipulate
temp_helix = Pose()
temp_helix.assign(struc)
# Make a change to the temporary confirmation
changeConfirmation2(temp_helix, sd)
# Calculate the score of the new permutation
new_score = scorefxn(temp_helix)
# If the score is better accept the change
if new_score < current_score:
struc = temp_helix
# If the score is worse only accept the change with a certain probability that is dependent on the magnitude of the
# change. I.e. a bigger change in energy makes it less likely to be accepted
else:
delta = abs(new_score - current_score)
prob = math.exp(-delta/temperature)
test_prob = random.uniform(0, 1)
if(test_prob < prob):
struc = temp_helix
return(struc)
This section will run the folding simulation from the non-folded conformation and make small moves. The standard deviation will be varied from 2 to 15. The scoring function is identical to the one used earlier: all PyRosetta terms, except for the dunbrack term and the weight of the hydrogen bonds in the backbone is set to 20.
# Set up hyperparameters to be searched
temperature_array = [100, 1000]
coolingRate_array = [0.01, 0.001]
stdevs = [2, 5, 10, 15]
native = pose_from_pdb("1gcn.pdb")
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
for k in range(len(stdevs)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
sd = stdevs[k]
expName = str(temperature) + "_" + str(coolingRate) + "_" + str(sd)
names.append(expName)
# Initialize the helix to have angles of 180 degrees again
for k in range(helix.total_residue()):
helix.set_phi(k + 1, 180)
helix.set_psi(k + 1, 180)
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt2(helix, temperature, sd)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
# THis is where the plotting for the above simulation needs to happen
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16) = plt.subplots(16, 1, figsize=(20,100))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(energy)), energy['100_0.01_2'])
ax1.set_title('Temperature: 100; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax1.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['100_0.01_5'])
ax2.set_title('Temperature: 100; cooling rate: 0.01; stdev: 5', fontsize = 20)
ax2.set_ylabel('Energy', fontsize = 15); ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['100_0.01_10'])
ax3.set_title('Temperature: 100; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax3.set_ylabel('Energy', fontsize = 15); ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['100_0.01_15'])
ax4.set_title('Temperature: 100; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax4.set_ylabel('Energy', fontsize = 15); ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['100_0.001_2'])
ax5.set_title('Temperature: 100; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax5.set_ylabel('Energy', fontsize = 15); ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['100_0.001_5'])
ax6.set_title('Temperature: 100; cooling rate: 0.001; stdev: 5', fontsize = 20)
ax6.set_ylabel('Energy', fontsize = 15); ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['100_0.001_10'])
ax7.set_title('Temperature: 100; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax7.set_ylabel('Energy', fontsize = 15); ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['100_0.001_15'])
ax8.set_title('Temperature: 100; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax8.set_ylabel('Energy', fontsize = 15); ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['1000_0.01_2'])
ax9.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax9.set_ylabel('Energy', fontsize = 15); ax9.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(energy)), energy['1000_0.01_5'])
ax10.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 5', fontsize = 20)
ax10.set_ylabel('Energy', fontsize = 15); ax10.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(energy)), energy['1000_0.01_10'])
ax11.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax11.set_ylabel('Energy', fontsize = 15); ax11.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(energy)), energy['1000_0.01_15'])
ax12.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax12.set_ylabel('Energy', fontsize = 15); ax12.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(energy)), energy['1000_0.001_2'])
ax13.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax13.set_ylabel('Energy', fontsize = 15); ax13.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(energy)), energy['1000_0.001_5'])
ax14.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 5', fontsize = 20)
ax14.set_ylabel('Energy', fontsize = 15); ax14.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(energy)), energy['1000_0.001_10'])
ax15.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax15.set_ylabel('Energy', fontsize = 15); ax15.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(energy)), energy['1000_0.001_15'])
ax16.set_title('Temperature: 1000; cooling rate: 0.1', fontsize = 20)
ax16.set_ylabel('Energy', fontsize = 15); ax16.set_xlabel('Iterations', fontsize = 15)
Energies at the end of the simulations
# Final values for each simulation
print("Temperature: 100; cooling rate: 0.01; stdev: 2-- ", energy['100_0.01_2'][energy['100_0.01_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 5-- ", energy['100_0.01_5'][energy['100_0.01_5'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 10-- ", energy['100_0.01_10'][energy['100_0.01_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 15-- ", energy['100_0.01_15'][energy['100_0.01_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 2-- ", energy['100_0.001_2'][energy['100_0.001_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 5-- ", energy['100_0.001_5'][energy['100_0.001_5'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 10-- ", energy['100_0.001_10'][energy['100_0.001_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 15-- ", energy['100_0.001_15'][energy['100_0.001_15'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 2-- ", energy['1000_0.01_2'][energy['1000_0.01_2'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 5-- ", energy['1000_0.01_5'][energy['1000_0.01_5'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 10-- ", energy['1000_0.01_10'][energy['1000_0.01_10'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 15-- ", energy['1000_0.01_15'][energy['1000_0.01_15'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 2-- ", energy['1000_0.001_2'][energy['1000_0.001_2'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 5-- ", energy['1000_0.001_5'][energy['1000_0.001_5'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 10-- ", energy['1000_0.001_10'][energy['1000_0.001_10'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 15-- ", energy['1000_0.001_15'][energy['1000_0.001_15'].last_valid_index()])
RMSD values
# THis is where the plotting for the above simulation needs to happen
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16) = plt.subplots(16, 1, figsize=(20,100))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(rmsd)), rmsd['100_0.01_2'])
ax1.set_title('Temperature: 100; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['100_0.01_5'])
ax2.set_title('Temperature: 100; cooling rate: 0.01; stdev: 5', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15); ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['100_0.01_10'])
ax3.set_title('Temperature: 100; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15); ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['100_0.01_15'])
ax4.set_title('Temperature: 100; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15); ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['100_0.001_2'])
ax5.set_title('Temperature: 100; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15); ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['100_0.001_5'])
ax6.set_title('Temperature: 100; cooling rate: 0.001; stdev: 5', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15); ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['100_0.001_10'])
ax7.set_title('Temperature: 100; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15); ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['100_0.001_15'])
ax8.set_title('Temperature: 100; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15); ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['1000_0.01_2'])
ax9.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15); ax9.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(rmsd)), rmsd['1000_0.01_5'])
ax10.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 5', fontsize = 20)
ax10.set_ylabel('RMSD', fontsize = 15); ax10.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(rmsd)), rmsd['1000_0.01_10'])
ax11.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax11.set_ylabel('RMSD', fontsize = 15); ax11.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(rmsd)), rmsd['1000_0.01_15'])
ax12.set_title('Temperature: 1000; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax12.set_ylabel('RMSD', fontsize = 15); ax12.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(rmsd)), rmsd['1000_0.001_2'])
ax13.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax13.set_ylabel('RMSD', fontsize = 15); ax13.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(rmsd)), rmsd['1000_0.001_5'])
ax14.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 5', fontsize = 20)
ax14.set_ylabel('RMSD', fontsize = 15); ax14.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(rmsd)), rmsd['1000_0.001_10'])
ax15.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax15.set_ylabel('RMSD', fontsize = 15); ax15.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(rmsd)), rmsd['1000_0.001_15'])
ax16.set_title('Temperature: 1000; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax16.set_ylabel('RMSD', fontsize = 15); ax16.set_xlabel('Iterations', fontsize = 15)
Final RMSD values:
# Final values for each simulation
print("Temperature: 100; cooling rate: 0.01; stdev: 2-- ", rmsd['100_0.01_2'][rmsd['100_0.01_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 5-- ", rmsd['100_0.01_5'][rmsd['100_0.01_5'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 10-- ", rmsd['100_0.01_10'][rmsd['100_0.01_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 15-- ", rmsd['100_0.01_15'][rmsd['100_0.01_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 2-- ", rmsd['100_0.001_2'][rmsd['100_0.001_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 5-- ", rmsd['100_0.001_5'][rmsd['100_0.001_5'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 10-- ", rmsd['100_0.001_10'][rmsd['100_0.001_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 15-- ", rmsd['100_0.001_15'][rmsd['100_0.001_15'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 2-- ", rmsd['1000_0.01_2'][rmsd['1000_0.01_2'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 5-- ", rmsd['1000_0.01_5'][rmsd['1000_0.01_5'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 10-- ", rmsd['1000_0.01_10'][rmsd['1000_0.01_10'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.01; stdev: 15-- ", rmsd['1000_0.01_15'][rmsd['1000_0.01_15'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 2-- ", rmsd['1000_0.001_2'][rmsd['1000_0.001_2'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 5-- ", rmsd['1000_0.001_5'][rmsd['1000_0.001_5'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 10-- ", rmsd['1000_0.001_10'][rmsd['1000_0.001_10'].last_valid_index()])
print("Temperature: 1000; cooling rate: 0.001; stdev: 15-- ", rmsd['1000_0.001_15'][rmsd['1000_0.001_15'].last_valid_index()])
f = plt.figure()
f, axes = plt.subplots(nrows = 4, ncols = 4, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['100_0.01_2'], energy['100_0.01_2'], alpha = 0.3)
axes[0][1].scatter(rmsd['100_0.01_5'], energy['100_0.01_5'], alpha = 0.3)
axes[0][2].scatter(rmsd['100_0.01_10'], energy['100_0.01_10'], alpha = 0.3)
axes[0][3].scatter(rmsd['100_0.01_15'], energy['100_0.01_15'], alpha = 0.3)
axes[1][0].scatter(rmsd['100_0.001_2'], energy['100_0.001_2'], alpha = 0.3)
axes[1][1].scatter(rmsd['100_0.001_5'], energy['100_0.001_5'], alpha = 0.3)
axes[1][2].scatter(rmsd['100_0.001_10'], energy['100_0.001_10'], alpha = 0.3)
axes[1][3].scatter(rmsd['100_0.001_15'], energy['100_0.001_15'], alpha = 0.3)
axes[2][0].scatter(rmsd['1000_0.01_2'], energy['1000_0.01_2'], alpha = 0.3)
axes[2][1].scatter(rmsd['1000_0.01_5'], energy['1000_0.01_5'], alpha = 0.3)
axes[2][2].scatter(rmsd['1000_0.01_10'], energy['1000_0.01_10'], alpha = 0.3)
axes[2][3].scatter(rmsd['1000_0.01_15'], energy['1000_0.01_15'], alpha = 0.3)
axes[3][0].scatter(rmsd['1000_0.001_2'], energy['1000_0.001_2'], alpha = 0.3)
axes[3][1].scatter(rmsd['1000_0.001_5'], energy['1000_0.001_5'], alpha = 0.3)
axes[3][2].scatter(rmsd['1000_0.001_10'], energy['1000_0.001_10'], alpha = 0.3)
axes[3][3].scatter(rmsd['1000_0.001_15'], energy['1000_0.001_15'], alpha = 0.3)
plt.show()
Based on both the energy values as well as the RMSD values, we can see that making smaller moves is actually worse than making larger moves, as the final valus for both the energy and RMSD terms are higher than if random moves were made.
Next I will perform the same simulation as above, the only difference will be that I will start in the folded conformation. Small angles will be picked in the same way, and the same scoring function will be used.
# Set up hyperparameters to be searched
temperature_array = [1, 10, 100]
coolingRate_array = [0.01, 0.001]
stdevs = [2, 10, 15]
# And the corresponding data frames
energy = pd.DataFrame()
rmsd = pd.DataFrame()
names = [] # this will store the name of the experiment
for i in range(len(temperature_array)):
for j in range(len(coolingRate_array)):
for k in range(len(stdevs)):
# Set up lists to track the results
energy_scores = []
rmsd_scores = []
temperature = temperature_array[i]
coolingRate = coolingRate_array[j]
sd = stdevs[k]
expName = str(temperature) + "_" + str(coolingRate) + "_" + str(sd)
names.append(expName)
# Return to native structure
helix = pose_from_pdb("1gcn.pdb")
while temperature > 0.005: # this sets the minimum temperature
helix = foldIt2(helix, temperature, sd)
# reduce the temperature
temperature = temperature * (1 - coolingRate)
# Convert results into series so that they can be concatenated to a pandas dataframe
energy_scores = pd.Series(energy_scores)
rmsd_scores = pd.Series(rmsd_scores)
# Concatenate results to a pandas dataframe
energy = pd.concat([energy, energy_scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
rmsd.columns = names
energy.columns = names
Energy values for the above simulation
# THis is where the plotting for the above simulation needs to happen
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16, ax17, ax18) = plt.subplots(18, 1, figsize=(20,120))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(energy)), energy['1_0.01_2'])
ax1.set_title('Temperature: 1; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax1.set_ylabel('Energy', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(energy)), energy['1_0.01_10'])
ax2.set_title('Temperature: 1; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax2.set_ylabel('Energy', fontsize = 15); ax2.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(energy)), energy['1_0.01_15'])
ax3.set_title('Temperature: 1; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax3.set_ylabel('Energy', fontsize = 15); ax3.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(energy)), energy['1_0.001_2'])
ax4.set_title('Temperature: 1; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax4.set_ylabel('Energy', fontsize = 15); ax4.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(energy)), energy['1_0.001_10'])
ax5.set_title('Temperature: 1; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax5.set_ylabel('Energy', fontsize = 15); ax5.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(energy)), energy['1_0.001_15'])
ax6.set_title('Temperature: 1; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax6.set_ylabel('Energy', fontsize = 15); ax6.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(energy)), energy['10_0.01_2'])
ax7.set_title('Temperature: 10; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax7.set_ylabel('Energy', fontsize = 15); ax7.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(energy)), energy['10_0.01_10'])
ax8.set_title('Temperature: 10; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax8.set_ylabel('Energy', fontsize = 15); ax8.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(energy)), energy['10_0.01_15'])
ax9.set_title('Temperature: 10; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax9.set_ylabel('Energy', fontsize = 15); ax9.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(energy)), energy['10_0.001_2'])
ax10.set_title('Temperature: 10; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax10.set_ylabel('Energy', fontsize = 15); ax10.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(energy)), energy['10_0.001_10'])
ax11.set_title('Temperature: 10; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax11.set_ylabel('Energy', fontsize = 15); ax11.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(energy)), energy['10_0.001_15'])
ax12.set_title('Temperature: 10; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax12.set_ylabel('Energy', fontsize = 15); ax12.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(energy)), energy['100_0.01_2'])
ax13.set_title('Temperature: 100; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax13.set_ylabel('Energy', fontsize = 15); ax13.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(energy)), energy['100_0.01_10'])
ax14.set_title('Temperature: 100; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax14.set_ylabel('Energy', fontsize = 15); ax14.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(energy)), energy['100_0.01_15'])
ax15.set_title('Temperature: 100; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax15.set_ylabel('Energy', fontsize = 15); ax15.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(energy)), energy['100_0.001_2'])
ax16.set_title('Temperature: 100; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax16.set_ylabel('Energy', fontsize = 15); ax16.set_xlabel('Iterations', fontsize = 15)
ax17.plot(range(len(energy)), energy['100_0.001_10'])
ax17.set_title('Temperature: 100; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax17.set_ylabel('Energy', fontsize = 15); ax17.set_xlabel('Iterations', fontsize = 15)
ax18.plot(range(len(energy)), energy['100_0.001_15'])
ax18.set_title('Temperature: 100; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax18.set_ylabel('Energy', fontsize = 15); ax18.set_xlabel('Iterations', fontsize = 15)
Final energy values for each simulation:
# Final values for each simulation
print("Temperature: 1; cooling rate: 0.01; stdev: 2-- ", energy['1_0.01_2'][energy['1_0.01_2'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01; stdev: 10-- ", energy['1_0.01_10'][energy['1_0.01_10'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01; stdev: 15-- ", energy['1_0.01_15'][energy['1_0.01_15'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 2-- ", energy['1_0.001_2'][energy['1_0.001_2'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 10-- ", energy['1_0.001_10'][energy['1_0.001_10'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 15-- ", energy['1_0.001_15'][energy['1_0.001_15'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 2-- ", energy['10_0.01_2'][energy['10_0.01_2'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 10-- ", energy['10_0.01_10'][energy['10_0.01_10'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 15-- ", energy['10_0.01_15'][energy['10_0.01_15'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 2-- ", energy['10_0.001_2'][energy['10_0.001_2'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 10-- ", energy['10_0.001_10'][energy['10_0.001_10'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 15-- ", energy['10_0.001_15'][energy['10_0.001_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 2-- ", energy['100_0.01_2'][energy['100_0.01_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 10-- ", energy['100_0.01_10'][energy['100_0.01_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 15-- ", energy['100_0.01_15'][energy['100_0.01_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 2-- ", energy['100_0.001_2'][energy['100_0.001_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 10-- ", energy['100_0.001_10'][energy['100_0.001_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 15-- ", energy['100_0.001_15'][energy['100_0.001_15'].last_valid_index()])
fig, (ax1, ax2, ax3, ax4, ax5, ax6, ax7, ax8, ax9, ax10, ax11, ax12, ax13, ax14, ax15, ax16, ax17, ax18) = plt.subplots(18, 1, figsize=(20,60))
#fig.suptitle('Horizontally stacked subplots')
ax1.plot(range(len(rmsd)), rmsd['1_0.01_2'])
ax1.set_title('Temperature: 1; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax1.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax2.plot(range(len(rmsd)), rmsd['1_0.01_10'])
ax2.set_title('Temperature: 1; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax2.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax3.plot(range(len(rmsd)), rmsd['1_0.01_15'])
ax3.set_title('Temperature: 1; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax3.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax4.plot(range(len(rmsd)), rmsd['1_0.001_2'])
ax4.set_title('Temperature: 1; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax4.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax5.plot(range(len(rmsd)), rmsd['1_0.001_10'])
ax5.set_title('Temperature: 1; cooling rate: 0.001; stdev: 5', fontsize = 20)
ax5.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax6.plot(range(len(rmsd)), rmsd['1_0.001_15'])
ax6.set_title('Temperature: 1; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax6.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax7.plot(range(len(rmsd)), rmsd['10_0.01_2'])
ax7.set_title('Temperature: 10; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax7.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax8.plot(range(len(rmsd)), rmsd['10_0.01_10'])
ax8.set_title('Temperature: 10; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax8.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax9.plot(range(len(rmsd)), rmsd['10_0.01_15'])
ax9.set_title('Temperature: 10; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax9.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax10.plot(range(len(rmsd)), rmsd['10_0.001_2'])
ax10.set_title('Temperature: 10; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax10.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax11.plot(range(len(rmsd)), rmsd['10_0.001_10'])
ax11.set_title('Temperature: 10; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax11.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax12.plot(range(len(rmsd)), rmsd['10_0.001_15'])
ax12.set_title('Temperature: 10; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax12.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax13.plot(range(len(rmsd)), rmsd['100_0.01_2'])
ax13.set_title('Temperature: 100; cooling rate: 0.01; stdev: 2', fontsize = 20)
ax13.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax14.plot(range(len(rmsd)), rmsd['100_0.01_10'])
ax14.set_title('Temperature: 100; cooling rate: 0.01; stdev: 10', fontsize = 20)
ax14.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax15.plot(range(len(rmsd)), rmsd['100_0.01_15'])
ax15.set_title('Temperature: 100; cooling rate: 0.01; stdev: 15', fontsize = 20)
ax15.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax16.plot(range(len(rmsd)), rmsd['100_0.001_2'])
ax16.set_title('Temperature: 100; cooling rate: 0.001; stdev: 2', fontsize = 20)
ax16.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax17.plot(range(len(rmsd)), rmsd['100_0.001_10'])
ax17.set_title('Temperature: 100; cooling rate: 0.001; stdev: 10', fontsize = 20)
ax17.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
ax18.plot(range(len(rmsd)), rmsd['100_0.001_15'])
ax18.set_title('Temperature: 100; cooling rate: 0.001; stdev: 15', fontsize = 20)
ax18.set_ylabel('RMSD', fontsize = 15); ax1.set_xlabel('Iterations', fontsize = 15)
Final RMSD values
# Final values for each simulation
print("Temperature: 1; cooling rate: 0.01; stdev: 2-- ", rmsd['1_0.01_2'][rmsd['1_0.01_2'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01; stdev: 10-- ", rmsd['1_0.01_10'][rmsd['1_0.01_10'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.01; stdev: 15-- ", rmsd['1_0.01_15'][rmsd['1_0.01_15'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 2-- ", rmsd['1_0.001_2'][rmsd['1_0.001_2'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 10-- ", rmsd['1_0.001_10'][rmsd['1_0.001_10'].last_valid_index()])
print("Temperature: 1; cooling rate: 0.001; stdev: 15-- ", rmsd['1_0.001_15'][rmsd['1_0.001_15'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 2-- ", rmsd['10_0.01_2'][rmsd['10_0.01_2'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 10-- ", rmsd['10_0.01_10'][rmsd['10_0.01_10'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.01; stdev: 15-- ", rmsd['10_0.01_15'][rmsd['10_0.01_15'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 2-- ", rmsd['10_0.001_2'][rmsd['10_0.001_2'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 10-- ", rmsd['10_0.001_10'][rmsd['10_0.001_10'].last_valid_index()])
print("Temperature: 10; cooling rate: 0.001; stdev: 15-- ", rmsd['10_0.001_15'][rmsd['10_0.001_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 2-- ", rmsd['100_0.01_2'][rmsd['100_0.01_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 10-- ", rmsd['100_0.01_10'][rmsd['100_0.01_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.01; stdev: 15-- ", rmsd['100_0.01_15'][rmsd['100_0.01_15'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 2-- ", rmsd['100_0.001_2'][rmsd['100_0.001_2'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 10-- ", rmsd['100_0.001_10'][rmsd['100_0.001_10'].last_valid_index()])
print("Temperature: 100; cooling rate: 0.001; stdev: 15-- ", rmsd['100_0.001_15'][rmsd['100_0.001_15'].last_valid_index()])
f = plt.figure()
f, axes = plt.subplots(nrows = 6, ncols = 3, figsize = (20, 20))
sc = axes[0][0].scatter(rmsd['1_0.01_2'], energy['1_0.01_2'], alpha = 0.3)
axes[0][1].scatter(rmsd['1_0.01_10'], energy['1_0.01_10'], alpha = 0.3)
axes[0][2].scatter(rmsd['1_0.01_15'], energy['1_0.01_15'], alpha = 0.3)
axes[1][0].scatter(rmsd['1_0.001_2'], energy['1_0.001_2'], alpha = 0.3)
axes[1][1].scatter(rmsd['1_0.001_10'], energy['1_0.001_10'], alpha = 0.3)
axes[1][2].scatter(rmsd['1_0.001_15'], energy['1_0.001_15'], alpha = 0.3)
axes[2][0].scatter(rmsd['10_0.01_2'], energy['10_0.01_2'], alpha = 0.3)
axes[2][1].scatter(rmsd['10_0.01_10'], energy['10_0.01_10'], alpha = 0.3)
axes[2][2].scatter(rmsd['10_0.01_15'], energy['10_0.01_15'], alpha = 0.3)
axes[3][0].scatter(rmsd['10_0.001_2'], energy['10_0.001_2'], alpha = 0.3)
axes[3][1].scatter(rmsd['10_0.001_10'], energy['10_0.001_10'], alpha = 0.3)
axes[3][2].scatter(rmsd['10_0.001_15'], energy['10_0.001_15'], alpha = 0.3)
axes[4][0].scatter(rmsd['100_0.01_2'], energy['100_0.01_2'], alpha = 0.3)
axes[4][1].scatter(rmsd['100_0.01_10'], energy['100_0.01_10'], alpha = 0.3)
axes[4][2].scatter(rmsd['100_0.01_15'], energy['100_0.01_15'], alpha = 0.3)
axes[5][0].scatter(rmsd['100_0.001_2'], energy['100_0.001_2'], alpha = 0.3)
axes[5][1].scatter(rmsd['100_0.001_10'], energy['100_0.001_10'], alpha = 0.3)
axes[5][2].scatter(rmsd['100_0.001_15'], energy['100_0.001_15'], alpha = 0.3)
plt.show()
Unsurprisingly, in this case we still get RMSD values that are close to the native structure. This is because making small moves when starting at the native will likely result in little change to the conformation.
One final possibility is that the algorithm only finds the correct conformation in a subset of the cases. To address this I will run the simulation 100 times and stop it once the RMSD score is less than 3.5.
### Run until close
helix = pose_from_pdb("1gcn.pdb")
results = pd.DataFrame()
rmsd = pd.DataFrame()
names = []
native = Pose()
native.assign(helix)
for k in range(100):
temperature = 1
cooling = 0.001
expName = str(temperature) + "_" + str(cooling)
names.append(expName)
scores = []
rmsd_scores = []
# Scrable up the structure
for k in range(helix.total_residue()):
angle1 = random.randint(0,360)
angle2 = random.randint(0,360)
helix.set_phi(k + 1, angle1)
helix.set_psi(k + 1, angle2)
while temperature > 0.005:
# Calculate the current score
current_score = scorefxn(helix)
current_rmsd = pyrosetta.rosetta.core.scoring.CA_rmsd(native, helix)
if current_rmsd < 3.5:
print("Reached something similar")
break
scores.append(current_score)
rmsd_scores.append(current_rmsd)
# Create a deepcopy permutation
temp_helix = Pose()
temp_helix.assign(helix)
# Make a change to the temporary confirmation
changeConfirmation(temp_helix)
# Calculate the score of the new permutation
new_score = scorefxn(temp_helix)
if new_score < current_score:
helix = temp_helix
else:
delta = abs(new_score - current_score)
prob = math.exp(-delta/temperature)
test_prob = random.uniform(0, 1)
if(test_prob < prob):
helix = temp_helix
temperature = temperature * (1 - cooling)
scores = pd.Series(scores)
rmsd_scores = pd.Series(rmsd_scores)
results = pd.concat([results, scores], ignore_index = True, axis = 1)
rmsd = pd.concat([rmsd, rmsd_scores], ignore_index = True, axis = 1)
print(scorefxn.show(helix))
#print(len(scores))
if current_rmsd < 3.5:
break
results.columns = names
view = nv.show_rosetta(helix)
view
The above simulation ran 100 times, yet it did not find a structure that was close in terms of RMSD value (i.e. less than 3.5). The structure found by the last simulation looked as follows:

Regardless of the energy function used to fold the protein, the native conformation could not be found in any simulation. This could be due to a number of things. First, the structure of the glucagon helix used here was determined via X-ray crystallography and was found to take on a helix when in a crystal lattice. Given that I am not modelling the lattice, this might be where some discrepancies may arise. Additionally, the simulation does not contain any water molecules that might help it fold if it were in solution. Second, the Rosetta scoring function was not designed for folding such short peptides, making this another potential source of errors.
Nonetheless, the scoring function used can fold the protein somewhat accurately if the simulation is started from the native conformation. Given that this is possible even at very high temperatures could indicate that the energy landscape is somewhat flat everywhere and the minimum is only reachable via a narrow entrace (depicted below).
A two dimensional represenation of the multidimensional energy landscape probably looks something like the following:

This final project has been a learning experience in many ways.
Specifically, I learned how to use PyRosetta, the state of the art protein modelling software used in both academia and industry for protein design and protein folding simulations. This may also become a really good selling point as I apply for graduate schools and labs that work in the protein design space. Additionally it has also allowed me to understand the file formats associated with protein structures (PDB formats), and how individual atoms are represented to show an entire protein.
Additionally, I learned new biochemistry concepts, starting with the Lennard-Jones potential energy model to score proteins to the more advanced methods available in PyRosetta. In speaking with Phil Bradley I also learned more about X-ray crystallography and how protein structures are obtained as well as some intricacies of thermodynamic processes.
I've also learned about visualizing figures in Python. Specifically about matplotlib and how to plot graphs in python, as well as nglview to visualize actual protein structures.
Finally, installing all of these libraries took me a month. Through this gruelsome process I learned how to install a linux subsystem and to never get a windows machine again.
Overall, I am hoping that this project will help me professionally too. As mentioned it is a very good selling point for getting into specific labs as well as potential jobs. In fact, I recently had a meeting with a bioinformatician at Amazon and she asked me if I could present on this as part of a recruiting process. So if grad school plans do not work out then I may end up working for Amazon, in part because of this final project.
Wikipedia
https://en.wikipedia.org/wiki/Ramachandran_plot#/media/File:Protein_backbone_PhiPsiOmega_drawing.svg